Abstract
Background Chronic graft-versus-host disease (cGVHD) is a pleomorphic clinical entity that causes significant morbidity and mortality after allogeneic hematopoietic cell transplant (alloHCT). The clinical heterogeneity of cGVHD poses challenges for standardized diagnosis and treatment. While assessment tools such as the NIH cGVHD scoring system are invaluable for enrolling patients in clinical trials and assessing treatment responses, these clinical tools capture only a subset of cGVHD manifestations and rely on clinician interpretation. Unbiased, comprehensive approaches for capturing cGVHD are needed to better reflect the full spectrum of clinical symptoms and to facilitate improved diagnosis, treatment and assessment of response.
Methods To better understand the depth and diversity of cGVHD manifestations, we developed a large language model (LLM) pipeline to extract symptoms and clinical events from unstructured progress notes. The pipeline combines extraction and self-reflection steps with retrieval-augmented generation using the CTCAE term database, enabling accurate extraction of diverse clinical descriptions to a standardized vocabulary. Using data from a de-identified electronic health record (EHR) database, we applied this pipeline to 14,276 outpatient hematology progress notes written for a single-center cohort of 523 alloHCT recipients (median 24 notes per patient; range 1–127) between day +100 and three years post-transplant.
Results Our pipeline identified 52,060 events, with a median of 3 events per note and 79 cumulative events per patient. Internal validation of the pipeline against manual annotations by hematologists and hematology fellows showed that the LLM identified events with a recall of 93% and a precision of 55% for events in the same organ system. The lower precision relative to recall reflected extraction of symptoms that were documented but not clearly attributed to cGVHD in the progress notes. External validation, based on concordance with medication prescriptions, showed that extracted symptoms were associated with a 5.04-fold increase in new prescriptions for appropriate supportive medications compared to the cohort baseline.
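As an illustration of how recall and precision at the organ-system level can be computed, the following is a minimal sketch. It assumes each event is reduced to a (note, organ system) pair; the function name `precision_recall` and the example pairs are hypothetical and are not taken from the study data or its validation code.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two collections of hashable events."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical (note_id, organ_system) pairs, for illustration only.
llm_events = {("note1", "eye"), ("note1", "skin"), ("note1", "GI"), ("note2", "mouth")}
manual_events = {("note1", "eye"), ("note2", "mouth")}

precision, recall = precision_recall(llm_events, manual_events)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every manually annotated event is recovered, while half of the extracted events have no manual counterpart, which mirrors the pattern of high recall with lower precision described above.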
Our pipeline identified high rates of known cGVHD manifestations in the cohort, with 61% of patients experiencing at least one of eye symptoms, oral involvement, or skin hyperpigmentation. To identify additional cGVHD manifestations, we analyzed which symptoms significantly co-occurred with symptoms specific to cGVHD (i.e., ocular, oral, genitourinary, and sclerotic manifestations). At a false discovery rate of 5%, frequently occurring symptoms that fall outside the NIH scoring system included muscle cramps (19%) and nasal symptoms (5%). Our analysis also identified known rare manifestations of cGVHD, including pericarditis (1.3%) and capillary leak syndrome (0.9%).
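The abstract does not state which procedure was used to control the false discovery rate. As one common option, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-symptom p-values; the choice of procedure, the function name `benjamini_hochberg`, and the example p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical co-occurrence p-values, one per candidate symptom.
example_p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
significant = benjamini_hochberg(example_p_values, alpha=0.05)
print(significant)
```

With these illustrative inputs only the two smallest p-values pass, because the third (0.039) exceeds its rank-adjusted threshold of 0.03.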
We next studied how these symptoms, which are not scorable under the NIH scoring system, correlated with response to immunosuppression. From our cohort, we identified 281 progress notes for 143 patients that were written in the two weeks before the patient started systemic corticosteroids. Of these patients, 58 were steroid refractory/dependent and received a second-line immunosuppressant such as ruxolitinib (76% of patients), belumosudil (14%), or ibrutinib (16%). The presence of any non-scorable symptom (i.e., muscle symptoms, nasal/sinus symptoms, serositis, or capillary leak syndrome) in the two weeks before the start of corticosteroid therapy was more frequent in patients who required additional immunosuppression (11/58, 19.0%) than in those who received prednisone alone (6/85, 7.1%). The reported odds ratio of 0.32 (p = 0.04) is expressed for the prednisone-alone group relative to the second-line group; with the second-line group as the reference numerator, the same counts give an odds ratio of approximately 3.1.
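The odds ratio can be checked directly from the counts reported above (11 of 58 versus 6 of 85). The sketch below computes it in both directions and adds a two-sided Fisher's exact test; the abstract does not name the test behind its p-value, so the use of Fisher's exact test here is an assumption, and the function names are illustrative.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = prob(a)
    # Sum all tables that are at least as unlikely as the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: second-line therapy (11 with symptom, 47 without),
#       prednisone alone    (6 with symptom, 79 without).
or_second_line = odds_ratio(11, 47, 6, 79)
or_prednisone = 1 / or_second_line
p_value = fisher_exact_two_sided(11, 47, 6, 79)
print(f"OR={or_second_line:.2f}, reciprocal={or_prednisone:.2f}, p={p_value:.3f}")
```

The two odds ratios are reciprocals of one another, so which value is reported depends only on which group is placed in the numerator.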
Conclusions Our initial findings demonstrate that LLMs can be a powerful tool for capturing the clinical heterogeneity and quantifying the symptom burden of cGVHD in an unbiased manner. This approach enabled the construction of a comprehensive database of post-transplant events. We anticipate that this database will facilitate future investigation into distinct patterns of cGVHD activity, the creation of well-calibrated metrics of disease severity, and the prediction of response to immunosuppressive therapy.